[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework#45837
[SPARK-47598][CORE] MLLib: Migrate logError with variables to structured logging framework#45837panbingkun wants to merge 9 commits into
Conversation
…red logging framework
| } | ||
|
|
||
| private def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = { | ||
| protected def withLogContext(context: java.util.HashMap[String, String])(body: => Unit): Unit = { |
There was a problem hiding this comment.
Why change private to protected
because some class extends Logging and override the method logInfo, logWarn, 'logError', eg:
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala
| /** | ||
| * Logs a LogEntry which message with a prefix that uniquely identifies the training session. | ||
| */ | ||
| override def logError(entry: LogEntry): Unit = { |
There was a problem hiding this comment.
Maybe we can write it as follows:
override def logError(entry: LogEntry): Unit = {
super.logError(MessageWithContext(prefix + entry.message, entry.context))
}
But it seems that the efficiency is not as high as mentioned above.
| val MIN_SIZE = Value | ||
| val REMOTE_ADDRESS = Value | ||
| val POD_ID = Value | ||
| val NUM_ITERATIONS = Value |
| val msg = s"Classification labels should be in [0 to ${numClasses - 1}]. " + | ||
| s"Found $numInvalid invalid labels." | ||
| val msg = log"Classification labels should be in " + | ||
| log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " + |
There was a problem hiding this comment.
I am thinking about making the log keys generic. How about making it RANGE here.
| s"Found $numInvalid invalid labels." | ||
| val msg = log"Classification labels should be in " + | ||
| log"${MDC(RANGE_CLASSIFICATION_LABELS, s"[0 to ${numClasses - 1}]")}. " + | ||
| log"Found ${MDC(NUM_CLASSIFICATION_LABELS, numInvalid)} invalid labels." |
There was a problem hiding this comment.
I am thinking about making the log keys generic. How about making it COUNT here.
|
|
||
| if (rawCoefficients == null) { | ||
| val msg = s"${optimizer.getClass.getName} failed." | ||
| val msg = log"${MDC(OPTIMIZER_CLASS_NAME, optimizer.getClass.getName)} failed." |
There was a problem hiding this comment.
There are multiple duplicated codes in the changes of this PR. Let's create a method to reduce duplications.
There was a problem hiding this comment.
Okay, very good suggestion
| case e: java.lang.AssertionError => | ||
| logError(s"FAILED for numIterations=$numIterations, learningRate=$learningRate," + | ||
| s" subsamplingRate=$subsamplingRate") | ||
| logError(log"FAILED for numIterations=${MDC(NUM_ITERATIONS, numIterations)}, " + |
There was a problem hiding this comment.
Let's create a method to reduce duplicated code in this file.
|
@gengliangwang all done. |
|
Thanks, merging to master |
What changes were proposed in this pull request?
The pr aims to migrate
logErrorin moduleMLLibwith variables tostructured logging framework.Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.